Skip to the bottom of this document for the map.

Boosted Regression Tree Model

The model uses the EcoDes-DK15 dataset to predict forest quality. Forest quality annotations were derived from polygon datasets provided by various Danish agencies:

forest quality    annotation source
high              §15 forest polygons
high              §25 forest polygons
high              private old growth
low               non-§25 ("ikke §25") forest polygons
low               NST plantation polygons

The polygons were grouped by forest quality. To provide a balanced training dataset, we randomly subsampled the low quality polygons down to n = 10k. The result was a dataset of ~20k polygons. For each polygon we extracted zonal statistics (mean and sd, weighted by cell area) for all EcoDes-DK15 variables.
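
The subsampling and the area-weighted zonal statistics can be illustrated with a minimal Python sketch. This is not the code used for the model. The function names, the example cell values and the choice of a population-style weighted standard deviation are assumptions made for illustration only.

```python
import math
import random

def weighted_mean_sd(values, weights):
    """Area-weighted mean and population-style standard deviation.

    Assumption: the weight of a cell is the area (or area fraction) of the
    cell that falls inside the polygon.
    """
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, math.sqrt(var)

def subsample(items, n, seed=0):
    """Randomly draw n items without replacement (all items if fewer than n)."""
    rng = random.Random(seed)
    items = list(items)
    return items if len(items) <= n else rng.sample(items, n)

# Hypothetical values of one variable in the cells touching one polygon,
# with the covered fraction of each cell as the weight.
cell_values = [2.0, 4.0, 6.0]
cell_weights = [1.0, 1.0, 0.5]
mean, sd = weighted_mean_sd(cell_values, cell_weights)

# Hypothetical list of low quality polygon ids, reduced to a fixed size.
low_quality_ids = list(range(100))
low_subset = subsample(low_quality_ids, 10)
```

Each polygon then contributes two predictors per variable (a `_mean` and a `_sd` column), which matches the suffixes in the variable names reported further down.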

We split the polygon data into two: 80% for training and 20% for testing. Models were then trained with 10-fold cross-validation on the 80% training data.
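
As a rough sketch of this setup, the following Python code holds out 20% of the polygons for testing and builds 10 cross-validation folds from the remaining 80%. It is an illustration only: the function names and polygon ids are hypothetical, and the split here is a plain random split, whereas the actual analysis may have stratified by forest quality.

```python
import random

def train_test_split(ids, train_frac=0.8, seed=42):
    """Shuffle the ids and cut them into a training and a test part."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    cut = int(round(train_frac * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def k_folds(ids, k=10):
    """Yield (fit_ids, validation_ids) pairs, one per fold."""
    folds = [ids[i::k] for i in range(k)]
    for i in range(k):
        held_out = folds[i]
        fit = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield fit, held_out

polygon_ids = list(range(1000))  # hypothetical polygon identifiers
train_ids, test_ids = train_test_split(polygon_ids)
splits = list(k_folds(train_ids, k=10))
```

The test polygons are never seen during cross-validation, so the confusion matrix computed on them gives an independent estimate of performance.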

We trained random forest and boosted regression tree models. The resulting models performed similarly. Hyperparameter tuning had little influence on performance when validated against the test dataset.

The overall model performance is okay: ~77% accuracy on the test dataset, slightly worse than the by-pixel models. See details below:

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction high  low
##       high 1412  465
##       low   419 1507
##                                           
##                Accuracy : 0.7676          
##                  95% CI : (0.7538, 0.7809)
##     No Information Rate : 0.5185          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.5349          
##                                           
##  Mcnemar's Test P-Value : 0.1301          
##                                           
##             Sensitivity : 0.7712          
##             Specificity : 0.7642          
##          Pos Pred Value : 0.7523          
##          Neg Pred Value : 0.7825          
##              Prevalence : 0.4815          
##          Detection Rate : 0.3713          
##    Detection Prevalence : 0.4936          
##       Balanced Accuracy : 0.7677          
##                                           
##        'Positive' Class : high            
## 
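
For readers who want to check how the summary statistics follow from the four counts in the confusion matrix, here is a short Python sketch that recomputes them. The variable names are our own. The confidence interval and the two p-values are not recomputed.

```python
# Counts copied from the confusion matrix above
# (rows: prediction, columns: reference; 'high' is the positive class).
tp, fp = 1412, 465   # predicted high: reference high / reference low
fn, tn = 419, 1507   # predicted low:  reference high / reference low
n = tp + fp + fn + tn

accuracy = (tp + tn) / n
sensitivity = tp / (tp + fn)          # share of reference 'high' found
specificity = tn / (tn + fp)          # share of reference 'low' found
ppv = tp / (tp + fp)                  # positive predictive value
npv = tn / (tn + fn)                  # negative predictive value
prevalence = (tp + fn) / n
no_information_rate = max(tp + fn, fp + tn) / n
balanced_accuracy = (sensitivity + specificity) / 2

# Cohen's kappa: agreement beyond what the marginal totals alone would give.
expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
kappa = (accuracy - expected) / (1 - expected)

for name, value in [
    ("Accuracy", accuracy),
    ("No Information Rate", no_information_rate),
    ("Kappa", kappa),
    ("Sensitivity", sensitivity),
    ("Specificity", specificity),
    ("Pos Pred Value", ppv),
    ("Neg Pred Value", npv),
    ("Prevalence", prevalence),
    ("Balanced Accuracy", balanced_accuracy),
]:
    print(f"{name:>20} : {value:.4f}")
```

The no information rate is the accuracy obtained by always predicting the larger reference class, so the gap between 0.5185 and 0.7676 is the gain attributable to the model.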

Here are the 20 most important predictors (out of 78), with importance scaled relative to the top predictor:

## gbm variable importance
## 
##   only 20 most important variables shown (out of 78)
## 
##                                    Overall
## normalized_z_sd_mean               100.000
## dtm_10m_mean                        30.645
## normalized_z_sd_sd                  25.606
## canopy_openness_sd                  24.185
## openness_mean_mean                  21.914
## twi_mean                            20.370
## amplitude_sd_sd                     15.871
## solar_radiation_mean                15.589
## aspect_sd                           15.560
## vegetation_proportion_20m.25m_sd    15.077
## aspect_mean                         14.580
## vegetation_proportion_25m.50m_sd    14.434
## twi_sd                              13.240
## heat_load_index_sd                  11.710
## amplitude_mean_sd                   10.861
## amplitude_mean_mean                 10.379
## vegetation_proportion_20m.25m_mean  10.377
## vegetation_proportion_19m.20m_sd    10.223
## vegetation_proportion_19m.20m_mean   9.346
## vegetation_proportion_03m.04m_mean   9.260

Aarhus Region

Here is a projection of the model results for the Aarhus region:

Note: The training data shown is the subset of training polygons within the Aarhus region. The model was trained on a nationwide training data set.